A Comparison of Merging Strategies for Translation of German Compounds
Abstract
In this article, I investigate compound processing for translation into German in a factored statistical MT system. Compounds are handled by splitting them prior to training and merging the parts after translation. I explore eight merging strategies that use different combinations of external knowledge sources, such as word lists, and internal sources that are carried through the translation process, such as symbols or parts-of-speech. I show that for merging to be successful, some internal knowledge source is needed. I also show that an extra sequence model for part-of-speech is useful for improving the order of compound parts in the output. The best merging results are achieved by a matching scheme for part-of-speech tags.
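The idea of merging by part-of-speech matching can be illustrated with a minimal sketch. It is not the article's implementation: the `-PART` tag suffix, the function name `merge_by_pos_match`, and the rule "merge a marked part with the next token when the base tags agree" are assumptions made for illustration only, and tokens are assumed to be lowercased.

```python
from typing import List, Tuple

# Hypothetical convention: a split-off compound part carries the POS of the
# whole compound plus a "-PART" suffix, e.g. ("fremdsprachen", "N-PART").
PART_SUFFIX = "-PART"


def merge_by_pos_match(tokens: List[Tuple[str, str]]) -> List[str]:
    """Merge marked compound parts with the following token if the POS matches."""
    out: List[str] = []
    pending = ""      # concatenated parts still waiting for a head word
    pending_pos = ""  # base POS tag of the pending parts
    for word, pos in tokens:
        if pos.endswith(PART_SUFFIX):
            base = pos[: -len(PART_SUFFIX)]
            if pending and base != pending_pos:
                # Parts with a different base tag cannot join; flush them.
                out.append(pending)
                pending = ""
            pending += word
            pending_pos = base
        elif pending and pos == pending_pos:
            # Head word with matching POS: emit the merged compound.
            out.append(pending + word)
            pending = ""
        else:
            # No match: emit the unmerged parts, then the current word.
            if pending:
                out.append(pending)
                pending = ""
            out.append(word)
    if pending:
        out.append(pending)
    return out


example = [("die", "ART"), ("fremdsprachen", "N-PART"),
           ("kenntnisse", "N"), ("sind", "V")]
merged = merge_by_pos_match(example)
```

In this sketch a marked part is left unmerged when the next token's tag does not match, which is one simple way to avoid gluing parts onto unrelated words when the translation output is misordered.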
Similar resources
Compound Processing for Phrase-Based Statistical Machine Translation
In this thesis I explore how compound processing can be used to improve phrase-based statistical machine translation (PBSMT) between English and German/Swedish. Both German and Swedish generally use closed compounds, which are written as one word without spaces or other indicators of word boundaries. Compounding is both common and productive, which makes it problematic for PBSMT, mainly due to ...
The comparison of contact toxicity of three formulations of lambda-cyhalothrin against German cockroach adults
Introduction: The German cockroach, Blattella germanica (L.), is one of the most serious household insect pests. Current control strategies rely heavily upon application of various formulations of insecticides. The purpose of the present study was to test the short-term effects of formulation of capsule suspension in comparison with formulations of wettable powder and emulsifiable concentrate ...
O-3: Drug Repositioning by Merging Gene Expression Data Analysis and Cheminformatics Target Prediction Approaches
The transcriptional responses of drug treatments combined with a protein target prediction algorithm were utilised to associate compounds to biological genomic space. This enabled us to predict efficacy of compounds in cMap and LINCS against 181 databases of diseases extracted from GEO. 18/30 of top drugs predicted for leukemia (e.g. Leflunomide and Etoposide) and breast cancer (e.g. Tamoxifen a...
German Compounds and Statistical Machine Translation. Can they get along?
This paper reports different experiments created to study the impact of using linguistics to preprocess German compounds prior to translation in Statistical Machine Translation (SMT). Compounds are a known challenge both in Machine Translation (MT) and Translation in general as well as in other Natural Language Processing (NLP) applications. In the case of SMT, German compounds are split into t...
Effects of sub-inhibitory concentrations of German chamomile (Matricaria recotita) extracts on the activity of catalase enzyme of S. aureus
Background: German chamomile (Matricaria recotita) as a medicinal plant has several therapeutic effects. Since the extract of this plant contains a high amount of α-bisabolol (Terpenoid) its antibacterial properties can be considered. Staphylococcus aureus as a pathogenic bacterium is important in clinical situations and food hygiene. So, investigation of effects of antibacterial compounds agai...